Gaming API Design Evaluation and Latency Budget
Let's discuss how our design achieves non-functional requirements and the lowest possible latency.
Introduction#
This lesson covers optimizations and tradeoffs to meet the proposed non-functional requirements. We also discuss response time, which is a key factor in our service's efficiency. Let's see how we meet the non-functional requirements, especially when dealing with global player interactions.
Non-functional requirements#
The sections below describe how the game API meets each non-functional requirement.
Availability#
By separating gameplay and CRUD operations on game assets, we ensure service decoupling and improve service availability. Our solution ensures that game configurations are on healthy servers through regular checks and backups. In the unlikely event of a server failure, we can quickly rebuild the session from the most recent backup. Furthermore, we can cap the maximum number of players who can join a game to manage resources efficiently.
Scalability#
Regionally distributed clusters make scaling services easier. This approach is scalable and cost-effective because each cluster controller manages multiple game servers, which makes adding and removing game servers much easier. Asynchronous communication between the cluster controller and game services enables efficient resource management and load balancing during periods of increased traffic.
Security#
We implement authentication and authorization through a login mechanism. Joining a game lobby requires a JWT access token, which is issued only when there is enough room for the players or teams that want to join the game. The JWT also identifies the user and the user's privileges in the game. When a game ends, the game server shares game stat updates with the game service via a private cloud to prevent data manipulation. We also have a patch controller for security and version updates, ensuring that every client joining the game has installed the services' critical updates. To prevent in-game hacking and cheating, each client periodically synchronizes its state with the server. Furthermore, our API validates state changes on the server before merging them and broadcasting them to other players.
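Server-side validation of state changes can be illustrated with a minimal sketch. It assumes a simple speed-limit rule for player movement; `MAX_SPEED`, `PlayerState`, `is_valid_move`, and `apply_update` are hypothetical names chosen for illustration and are not part of the API described in this lesson.

```python
import math
from dataclasses import dataclass

MAX_SPEED = 10.0  # hypothetical limit: distance units a player may cover per second


@dataclass
class PlayerState:
    x: float
    y: float
    timestamp: float  # seconds on the server clock


def is_valid_move(previous: PlayerState, proposed: PlayerState) -> bool:
    """Reject moves that go back in time or exceed the speed limit."""
    elapsed = proposed.timestamp - previous.timestamp
    if elapsed <= 0:
        return False
    distance = math.hypot(proposed.x - previous.x, proposed.y - previous.y)
    return distance <= MAX_SPEED * elapsed


def apply_update(authoritative: dict, player_id: str, proposed: PlayerState) -> bool:
    """Merge a proposed state into the authoritative state only if it is valid.

    Returns True when the update was accepted and may be broadcast to others.
    """
    previous = authoritative[player_id]
    if not is_valid_move(previous, proposed):
        return False
    authoritative[player_id] = proposed
    return True
```

A rejected update leaves the authoritative state untouched, so a tampered client cannot push an impossible position to other players.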
Low latency#
We achieve low latency by sending as little data (state changes) as possible over the wire. For global tournaments, we replicate contestant data (weapon appearance, avatars, and other defaults) across regionally distributed clusters so we don’t have to send this information across regions during gameplay. We implement lag compensation and client move prediction to reduce the user-perceived latency and improve the gameplay experience. We also buffer and prefetch a small amount of data before showing it to other players. Additionally, data is transferred over a dedicated VPC channel with preallocated bandwidth that is configured specifically for high-priority requests to keep latency as low as possible.
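Sending only state changes can be sketched as a delta between two snapshots. The example below assumes the game state is a flat dictionary; `state_delta` and `apply_delta` are hypothetical helpers for illustration, not functions of the game API.

```python
_MISSING = object()  # sentinel so that a stored value of None still compares correctly


def state_delta(previous: dict, current: dict) -> dict:
    """Return only the keys that changed, were added, or were removed."""
    changed = {
        key: value
        for key, value in current.items()
        if previous.get(key, _MISSING) != value
    }
    removed = [key for key in previous if key not in current]
    return {"changed": changed, "removed": removed}


def apply_delta(state: dict, delta: dict) -> dict:
    """Rebuild the new snapshot from an old snapshot and a delta."""
    merged = {k: v for k, v in state.items() if k not in delta["removed"]}
    merged.update(delta["changed"])
    return merged
```

If a player only moves and loses health, the delta carries two fields instead of the whole snapshot, and the receiver can still reconstruct the full state.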
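Achieving Non-Functional Requirements
Non-Functional Requirements | Approaches |
Availability | Decoupling gameplay from CRUD operations on game assets; regular health checks and backups; rebuilding sessions from the most recent backup; capping the number of players per game |
Scalability | Regionally distributed clusters; a cluster controller that adds and removes game servers; asynchronous communication between the cluster controller and game services |
Security | Login-based authentication and authorization with JWT access tokens; sharing game stats over a private cloud; a patch controller for security and version updates; periodic client-server state synchronization; server-side validation of state changes |
Low latency | Sending only state changes; replicating contestant data across regional clusters; lag compensation and client move prediction; buffering and prefetching; a dedicated VPC channel with preallocated bandwidth |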
Latency budget#
Let's estimate the latency of the following three main operations performed by our game API.
Note: As discussed in the back-of-the-envelope calculations for latency, for POST requests, the time grows with the data size by a fixed amount per KB on top of the base RTT (the minimum RTT taken by a request with the smallest data size), which was 260 ms. Moreover, the time to download the response to a request also grows by a fixed amount per KB.
Initiate game session#
Clients send POST requests to initiate a game session. Let's estimate the request and response sizes to calculate the response time for this request.
Request and response size#
We assume that a payload size of 11 KB is attached to the request, which contains client-device information such as the installed game version, playerId, gameMode, playerDefaults, and other game settings. The response returned by the server is estimated to be roughly 7 KB and contains the joinURL server, the server's exposed port, matchId, roomId, the hash identifier assigned to the player, and so on. Let's use these sizes to calculate the response time in the next section.
Response time#
Let's use the calculator below to estimate the response time for initiating the game session on a game server.
Response Time Calculator for the Initiation of a Game Session
| Request size | 11 | KB |
| Response size | 7 | KB |
| Minimum latency | 395.95 | ms |
| Maximum latency | 476.95 | ms |
| Minimum response time | 399.95 | ms |
| Maximum response time | 480.95 | ms |
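The calculations above assume a POST request of 11 KB and a response of 7 KB. The latency is the sum of the base time, the round-trip time of the POST request (which grows with the request size), and the time to download the response. The minimum and maximum latencies differ only in the base time used, which accounts for the 81 ms gap between 395.95 ms and 476.95 ms.
To calculate the response time, we add the server's processing time (4 ms) to the latency:
Minimum response time = 395.95 ms + 4 ms = 399.95 ms
Similarly:
Maximum response time = 476.95 ms + 4 ms = 480.95 ms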
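The calculator figures above can be reproduced with a simple linear model. The sketch below is one such model and not necessarily the one behind the calculator: the 260 ms base RTT for POST, the 201.5 ms base time, and the 4 ms processing time appear in this lesson, while the 120.5 ms minimum base time and the per-KB rates (1.15 ms for the request, 0.4 ms for the response download) are assumptions back-solved from the calculator's outputs.

```python
BASE_TIME_MIN_MS = 120.5    # assumption: back-solved from the 395.95 ms minimum latency
BASE_TIME_MAX_MS = 201.5    # base time quoted for the GET request in this lesson
RTT_POST_BASE_MS = 260.0    # base RTT for POST requests, from the note above
RTT_POST_PER_KB_MS = 1.15   # assumption: back-solved from the calculator values
DOWNLOAD_PER_KB_MS = 0.4    # assumption: back-solved from the calculator values
PROCESSING_MS = 4.0         # server processing time used in this lesson


def post_latency_ms(request_kb: float, response_kb: float, base_ms: float) -> float:
    """Latency = base time + size-dependent POST RTT + response download time."""
    rtt = RTT_POST_BASE_MS + RTT_POST_PER_KB_MS * request_kb
    download = DOWNLOAD_PER_KB_MS * response_kb
    return base_ms + rtt + download


def post_response_time_ms(request_kb: float, response_kb: float, base_ms: float) -> float:
    """Response time = latency + server processing time."""
    return post_latency_ms(request_kb, response_kb, base_ms) + PROCESSING_MS


if __name__ == "__main__":
    for label, base in (("min", BASE_TIME_MIN_MS), ("max", BASE_TIME_MAX_MS)):
        latency = post_latency_ms(11, 7, base)
        response = post_response_time_ms(11, 7, base)
        print(f"{label}: latency={latency:.2f} ms, response={response:.2f} ms")
```

With an 11 KB request and a 7 KB response, the model lands on the same four values as the calculator, which suggests the assumed rates are at least consistent with it.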
Note: Although the above numbers may seem high for a real-time communication application like a game, it should be noted that these requests are made before the game starts, and latency is not critical for such initial requests. Once the actual game starts, it's important to keep latency to a minimum to ensure a smooth gaming experience.
Start playing#
In order to join a gaming room, clients send GET requests. Let's see how the operation is carried out and how long it takes to finish.
Request and response size#
We know from the previous lesson that the actual gameplay is performed using the WebSocket protocol. Since the upgrade to WebSocket doesn’t require a body in the request and response messages, we assume the messages to be 1 KB in size, including header fields such as host, authentication, status code, and so on.
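To make the body-less upgrade concrete, the sketch below builds a WebSocket opening request and computes the `Sec-WebSocket-Accept` value the server returns, following the handshake defined in RFC 6455. The host, path, and bearer token are hypothetical placeholders, and carrying the JWT in an `Authorization` header is an assumption about this API rather than something the lesson specifies.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str, token: str) -> bytes:
    """Build a header-only GET request asking to switch to WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        f"Authorization: Bearer {token}",  # assumption: JWT sent as a bearer token
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    key = "dGhlIHNhbXBsZSBub25jZQ=="  # sample key from RFC 6455
    request = build_upgrade_request("game.example.com", "/rooms/42", key, "<jwt>")
    print(len(request), "bytes;", "accept =", websocket_accept(key))
```

The request consists of headers only, which is why an estimate of about 1 KB per message is a comfortable upper bound.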
Response time#
For an HTTP GET request with a base time of 201.5 ms, a round-trip time of 70 ms, and a processing time of 4 ms, we get a protocol-switching response time of 275.9 ms. The three stated components sum to 275.5 ms; the remaining 0.4 ms is attributable to downloading the 1 KB response.
Send events#
Let's calculate the latency between clients and the game server when sending and receiving events.
Message size#
We use a binary format to represent event information such as moves or actions performed by the player. Therefore, the sent data would be in the range of a few hundred bytes (say, 300 bytes).
The size of the incoming message depends on the changes made by different players in that game to the global game state maintained on the server. Let's roughly calculate the overall state change by taking the maximum number of players allowed in the same room (100) and assuming they all make a similar change of 300 bytes. In the worst case, that gives 100 × 300 bytes = 30,000 bytes, or roughly 30 KB.
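A binary event can be sketched with Python's `struct` module. The field layout below (player ID, event type, sequence number, and a 3D position) is a hypothetical example, not the wire format of this API; it only shows why a single event fits comfortably within a few hundred bytes and how the worst-case incoming size is obtained.

```python
import struct

# Hypothetical little-endian layout with no padding:
# player_id (uint32), event_type (uint8), sequence (uint32), x, y, z (float32 each)
EVENT_FORMAT = "<IBIfff"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 4 + 1 + 4 + 12 = 21 bytes


def encode_event(player_id: int, event_type: int, sequence: int,
                 x: float, y: float, z: float) -> bytes:
    """Pack one player event into a fixed-size binary record."""
    return struct.pack(EVENT_FORMAT, player_id, event_type, sequence, x, y, z)


def decode_event(data: bytes) -> tuple:
    """Unpack a binary record back into its fields."""
    return struct.unpack(EVENT_FORMAT, data)


def worst_case_state_bytes(players: int, bytes_per_player: int) -> int:
    """Upper bound on the incoming state change if every player sends an update."""
    return players * bytes_per_player


if __name__ == "__main__":
    packet = encode_event(7, 1, 1024, 1.5, -2.0, 0.25)
    print(len(packet), "bytes per event")
    print(worst_case_state_bytes(100, 300), "bytes in the worst case")
```

Even with extra fields for actions or inventory changes, a record like this stays well below the 300-byte estimate used in the text.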
Response time#
Because the connection has already been established and upgraded to WebSocket, we may disregard elements like base time and request compilation time. The latency of an event message then depends only on the time the message spends in transit over the network and the time needed to transfer its payload. The same reasoning applies to both outgoing and incoming events.
Let's use the following calculator to add the processing time taken by the game server to sync up all the states:
Response Time Calculator for Sending and Receiving Game State
| Message size | 0.3 | KB |
| Total number of players | 100 | players |
| Estimated outgoing event latency | 35.12 | ms |
| Estimated incoming event latency | 39.33 | ms |
| User-perceived response time | 78.45 | ms |
The following illustration summarizes the overall response time required to exchange game events:
Note: A latency of 50–100 ms is considered adequate for gaming. The response time calculated above (about 78 ms) is an average across servers located in different parts of the world and falls within this range.
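The user-perceived figure can be decomposed as outgoing latency plus incoming latency plus server processing time. The sketch below makes two assumptions that are labeled in the code: the outgoing latency is modeled as half of the 70 ms round-trip time plus 0.4 ms per KB of payload (which reproduces the 35.12 ms value), and the incoming latency is taken directly from the calculator rather than derived.

```python
RTT_MS = 70.0              # round-trip time used earlier in this lesson
TRANSFER_PER_KB_MS = 0.4   # assumption: per-KB transfer cost consistent with the calculator
PROCESSING_MS = 4.0        # server processing time used earlier in this lesson


def outgoing_latency_ms(message_kb: float) -> float:
    """Assumed model: one-way delay (half the RTT) plus payload transfer time."""
    return RTT_MS / 2 + TRANSFER_PER_KB_MS * message_kb


def user_perceived_ms(outgoing_ms: float, incoming_ms: float) -> float:
    """Outgoing event + server processing + incoming state update."""
    return outgoing_ms + PROCESSING_MS + incoming_ms


if __name__ == "__main__":
    outgoing = outgoing_latency_ms(0.3)
    incoming = 39.3344  # value reported by the calculator, not derived here
    print(f"outgoing={outgoing:.2f} ms, total={user_perceived_ms(outgoing, incoming):.2f} ms")
```

Under these assumptions, the total stays inside the 50–100 ms range mentioned in the note, with the one-way network delay dominating both directions.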
Optimization and tradeoffs#
There is a tight coupling between the client and the server that validates the game state during a session. For a set of clients belonging to the same region, it may be plausible to provide a smoother experience. However, as we scale, it becomes a challenge to provide the same level of experience to clients with varying configurations, networks, and devices. This calls for some optimization techniques, which we list below:
Dedicating resources: We can allocate resources for special events to ensure low ping and consistent response times. For example, a dedicated portion of the VPC bandwidth is provided for a global tournament while regular game mode can utilize the remaining bandwidth until the tournament ends. This is a tradeoff that highly depends on the business strategy of the organization.
UDP-based communication: TCP provides reliable communication, ensuring no data is lost during transmission. However, it can cause high latency when packets are lost or arrive out of order. For such cases, we can use UDP-based communication to reduce latency where small data losses can be tolerated. We can even opt for protocols that are built on UDP but add reliability, such as QUIC (the transport underneath HTTP/3) or RUDP. QUIC and HTTP/3 are standardized by the IETF, whereas RUDP never progressed beyond an Internet-Draft. Reliable-UDP approaches are nevertheless common in the gaming industry, even though they may lack features such as prefetching, buffering, and other performance-enhancing techniques. So it is a tradeoff between latency and the added complexity of making UDP reliable.
Note: Applications may also add ordering and selective reliability to UDP on the application layer.
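Application-layer ordering on top of UDP can be sketched with a sequence number in each datagram. The example below assumes state updates where only the newest one matters, so late or duplicate datagrams are simply dropped; the 4-byte header and the `LatestStateReceiver` class are hypothetical, and sequence-number wraparound is ignored for brevity.

```python
import struct
from typing import Optional

HEADER = struct.Struct("<I")  # hypothetical header: 32-bit sequence number


def make_datagram(sequence: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number."""
    return HEADER.pack(sequence) + payload


class LatestStateReceiver:
    """Accepts only datagrams newer than the last one seen."""

    def __init__(self) -> None:
        self.last_sequence = -1

    def accept(self, datagram: bytes) -> Optional[bytes]:
        """Return the payload if the datagram is new, or None if it is stale."""
        (sequence,) = HEADER.unpack_from(datagram)
        if sequence <= self.last_sequence:
            return None  # late or duplicate update: drop instead of retransmitting
        self.last_sequence = sequence
        return datagram[HEADER.size:]
```

Events that must not be lost (for example, a purchase or a match result) would need acknowledgments and retransmission on top of this, which is where the added complexity of selective reliability comes from.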
Summary#
In this chapter, we learned how to design the API of a gaming system by first highlighting its requirements. Based on the established requirements, we designed the end-to-end communication and underlined the important decisions. Next, we focused on the unique endpoints of the gaming API. Finally, we learned how our API meets the non-functional requirements, including protection against client-side cheats and hacks and support for real-time communication.